further improving roles_is_member_of()

  • Jump to comment-1
    nathandbossart@gmail.com2024-04-12T04:16:46+00:00
    (moved to a new thread) On Thu, Mar 21, 2024 at 04:31:45PM -0400, Tom Lane wrote: > I wrote: >> ... I still see the problematic GRANT taking ~250ms, compared >> to 5ms in v15. roles_is_member_of is clearly on the hook for that. > > Ah: looks like that is mainly the fault of the list_append_unique_oid > calls in roles_is_member_of. That's also an O(N^2) cost of course, > though with a much smaller constant factor. > > I don't think we have any really cheap way to de-duplicate the role > OIDs, especially seeing that it has to be done on-the-fly within the > collection loop, and the order of roles_list is at least potentially > interesting. Not sure how to make further progress without a lot of > work. I looked at this one again because I suspect these "thousands of roles" cases are going to continue to appear. Specifically, I tried to convert the cached roles lists to hash tables to avoid the list_member_oid linear searches. That actually was pretty easy to do. The most complicated part is juggling a couple of lists to keep track of the roles we need to iterate over in roles_is_member_of(). AFAICT the only catch is that select_best_grantor() seems to depend on the ordering of roles_list so that it prefers a "closer" role. To deal with that, I added a "depth" field to the entry struct that can be used as a tiebreaker. I'm not certain that this is good enough, though. As shown in the attached work-in-progress patch, this actually ends up removing more code than it adds, and it seems to provide similar results to HEAD (using the benchmark from the previous thread [0]). I intend to test it with many more roles to see if it's better in more extreme cases. [0] https://postgr.es/m/341186.1711037256%40sss.pgh.pa.us -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
    • Jump to comment-1
      nathandbossart@gmail.com2024-04-12T19:19:37+00:00
      On Thu, Apr 11, 2024 at 11:16:33PM -0500, Nathan Bossart wrote: > As shown in the attached work-in-progress patch, this actually ends up > removing more code than it adds, and it seems to provide similar results to > HEAD (using the benchmark from the previous thread [0]). I intend to test > it with many more roles to see if it's better in more extreme cases. Even with 100K roles, the Bloom filter added in commit d365ae7 seems to do a pretty good job at keeping up with the hash table approach. The callers of roles_is_member_of() that do list_member_oid() on the returned list might benefit a little from a hash table, but I'm skeptical it makes much difference in practice. This was an interesting test, but I'll likely withdraw this patch shortly. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com